perf: implement batch processing in iterateEvalTree by cheb0 · Pull Request #406 · ozontech/seq-db

cheb0 · 2026-04-22T11:54:11Z

Description

Continuation of #390

iterateEvalTree works with batches of lids, requests batches of mids and rids
fixes stopwatch measurements for get_mid step
array-based hist map is decoupled into its own struct

I did some measurements for both patches (this combined with #390) vs main (used bitpack encoding in both branches). For small ordinary searches there is no benefit. For dense analytic queries there is a decent improvement.

For our k6 benchmark seq-db-hist.js: 2.3 sec => 650 ms
For seq-db-aggs.js: 6.1 sec => 4.7 sec
Hist over _all_ (warm query) (3 prod fractions): ~37 ms => ~15 ms

Part of #329

I have read and followed all requirements in CONTRIBUTING.md;
I used LLM/AI assistance to make this pull request;

cheb0 · 2026-04-22T11:56:09Z

@seqbenchbot up main search-keyword-exact-match-warm

seqbenchbot · 2026-04-22T11:56:11Z

Nice, @cheb0 <(-^,^-)=b!

Your request was successfully served.
Identificator for your ongoing benchmark - e8eefca9.

Here is a list of helpful links:

Take a look at Grafana dashboard;
Live-tailing logs are also available;

Have a great time!

codecov-commenter · 2026-04-22T11:58:09Z

Codecov Report

❌ Patch coverage is 78.76712% with 31 lines in your changes missing coverage. Please review.
✅ Project coverage is 70.58%. Comparing base (da8604a) to head (74748b8).

Files with missing lines	Patch %	Lines
frac/processor/search.go	71.26%	22 Missing and 3 partials ⚠️
frac/sealed/seqids/provider.go	70.00%	2 Missing and 1 partial ⚠️
frac/sealed_index.go	72.72%	2 Missing and 1 partial ⚠️

Additional details and impacted files

@@                Coverage Diff                 @@
##           329-batching-1     #406      +/-   ##
==================================================
- Coverage           71.54%   70.58%   -0.97%     
==================================================
  Files                 220      221       +1     
  Lines               16568    20423    +3855     
==================================================
+ Hits                11854    14415    +2561     
- Misses               3840     5128    +1288     
- Partials              874      880       +6


cheb0 · 2026-04-22T12:12:21Z

@seqbenchbot down e8eefca9

seqbenchbot · 2026-04-22T12:12:29Z

Nice, @cheb0 <(-^,^-)=b!

The benchmark with identificator e8eefca9 was finished.
I've prepared a summary for you. Click on Show summary button to see it:

Show summary

Query	Type	mean base (ms)	mean comp (ms)	mean diff	stddev base (ms)	stddev comp (ms)	stddev diff	p(50) base (ms)	p(50) comp (ms)	p(50) diff	p(95) base (ms)	p(95) comp (ms)	p(95) diff	p(99) base (ms)	p(99) comp (ms)	p(99) diff	iterations base	iterations comp	iterations diff
`bulk`	warm	`65.92`	`67.00`	`+1.65%`	`25.11`	`26.77`	`+6.60%`	`58.00`	`60.00`	`+3.45%`	`118.00`	`124.00`	`+5.08%`	`157.50`	`165.00`	`+4.76%`	`2450.00`	`2450.00`	`0.00%`
`service:payment-backend-eu AND k8s_namespace:prod`	warm	`130.57`	`129.30`	`-0.97%`	`115.82`	`112.04`	`-3.26%`	`114.00`	`115.00`	`+0.88%`	`324.00`	`319.00`	`-1.54%`	`669.50`	`642.50`	`-4.03%`	`8339.00`	`8363.00`	`+0.29%`

Have a great time!

forshev · 2026-05-04T10:09:54Z

+	for _, lid := range lids {
+		rawLid := lid.Unpack()
+		blockIdx := p.table.GetIDBlockIndexByLID(rawLid)
+		if p.midCache.blockIndex != int(blockIdx) {


nit: fillMIDs has this check inside. Did you add it here to avoid a function call?

forshev · 2026-05-04T12:09:10Z

+		// Get MIDs
+		if needMids > 0 {
+			timerMID.Start()
+			mids = idsIndex.GetMIDs(lidsSlice[0:needMids], mids[:0])


nit: technically we can omit the lower bound if it equals 0

lidsSlice[:needMids]
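A tiny sketch of why the two spellings are interchangeable: in Go, an omitted lower bound in a slice expression defaults to 0, so both forms yield the same length and share the same backing array. The `prefixes` helper is hypothetical and exists only for this illustration.

```go
package main

import "fmt"

// prefixes returns the first n elements of s using both spellings.
func prefixes(s []int, n int) ([]int, []int) {
	return s[0:n], s[:n]
}

func main() {
	a, b := prefixes([]int{7, 8, 9, 10}, 2)
	// Same length and same first element address: the slices are identical views.
	fmt.Println(len(a) == len(b), &a[0] == &b[0])
}
```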

dkharms · 2026-05-12T12:42:59Z

 	return seq.MID(p.midCache.GetValByLID(uint32(lid))), nil
 }

+func (p *Provider) MIDs(lids []node.LID, out []seq.MID) ([]seq.MID, error) {


Why does Provider have a method for retrieving a batch of MIDs but no similar method for RIDs?

created RIDs method

dkharms · 2026-05-12T12:55:59Z

+	defer searchBuffersPool.Put(buffers)
+	mids := buffers.mids
+	rids := buffers.rids
+	lidsBuffer := buffers.lids


Shouldn't you reset buffers since slices are reused?

looks like no, since I do not reassign it
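The reply above can be illustrated with a small sketch, assuming the function only works on local copies of the slice headers and truncates them with `[:0]` before filling. The `fill` and `use` helpers are hypothetical, and `uint64` stands in for `seq.MID`.

```go
package main

import "fmt"

// searchBuffers mirrors the shape discussed in the thread (simplified).
type searchBuffers struct {
	mids []uint64
}

// fill appends n values to out and returns the extended slice.
func fill(out []uint64, n int) []uint64 {
	for i := 0; i < n; i++ {
		out = append(out, uint64(i))
	}
	return out
}

// use mimics one search call: it works on a local copy of the slice header.
func use(b *searchBuffers, n int) int {
	mids := b.mids           // local copy of the slice header
	mids = fill(mids[:0], n) // only the local variable is reassigned
	return len(mids)
}

func main() {
	b := &searchBuffers{mids: make([]uint64, 0, 4)}
	fmt.Println(use(b, 3), len(b.mids), cap(b.mids))
	fmt.Println(use(b, 10), len(b.mids), cap(b.mids))
}
```

Because the struct field is never reassigned, the pooled value always holds a zero-length slice and needs no reset. The flip side, under the same assumption, is that a backing array grown by `append` beyond the original capacity is not kept in the pool.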

dkharms · 2026-05-12T13:00:09Z

+		lidsBuf := lidsBuf{
+			lids: make([]node.LID, 0, consts.LIDBlockCap),
+		}
+		return searchBuffers{


It's better to return a pointer here, otherwise there will be unnecessary allocations since any is returned.
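A minimal sketch of the suggestion, assuming a simplified `searchBuffers` with hypothetical capacities. `sync.Pool` stores values as `any`; converting a multi-word struct value to an interface typically copies it to the heap on every `Put`, while a pointer fits into the interface directly.

```go
package main

import (
	"fmt"
	"sync"
)

// searchBuffers is a simplified stand-in for the struct in the PR.
type searchBuffers struct {
	lids []uint32
	mids []uint64
}

// pool hands out pointers, so Get and Put do not box a struct value.
var pool = sync.Pool{
	New: func() any {
		return &searchBuffers{
			lids: make([]uint32, 0, 1024), // hypothetical capacity
			mids: make([]uint64, 0, 1024),
		}
	},
}

func borrow() *searchBuffers   { return pool.Get().(*searchBuffers) }
func release(b *searchBuffers) { pool.Put(b) }

func main() {
	b := borrow()
	b.lids = append(b.lids[:0], 1, 2, 3)
	fmt.Println(len(b.lids), cap(b.lids))
	release(b)
}
```

With a pointer in the pool, the type assertion at the call site becomes `.(*searchBuffers)`.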

dkharms · 2026-05-12T13:07:47Z

+	filterMIDs := sw.Timer("filter_mids")
+	updateHist := sw.Timer("update_hist")


Suggested change

-	filterMIDs := sw.Timer("filter_mids")
-	updateHist := sw.Timer("update_hist")
+	timerFilterMIDs := sw.Timer("filter_mids")
+	timerUpdateHist := sw.Timer("update_hist")

dkharms · 2026-05-12T13:11:45Z

Suggested change

LIDs(out []node.LID) []node.LID

dkharms · 2026-05-12T15:33:07Z

I'll leave it here since it is out of scope of this diff.

Take a look at https://github.com/ozontech/seq-db/blob/329-batching-iterate-eval-tree/frac/sealed/lids/iterator_desc.go#L121-L131 -- I guess you've introduced code duplication while performing rebase.

fixed, thanks

dkharms · 2026-05-12T15:37:49Z

+	return total, ids, hist, aggs, nil
+}
+
+func filterOutOfRangeMIDs(params SearchParams, mids []seq.MID, lidsSlice []node.LID) ([]seq.MID, []node.LID) {


I am not sure what purpose this function serves.

Per my understanding, we cannot iterate over seq.LID which correspond to seq.ID that lie outside of user-requested range [from; to] -- this is guaranteed because we calculate minLID and maxLID in getLIDsBorders and use those in all iterators to set boundaries.

Am I missing something?

This check is right from the original implementation, I have not added it: https://github.com/ozontech/seq-db/blob/main/frac/processor/search.go#L229.

The only thing it does is convert a potential panic into an error if deriving minLID and maxLID somehow fails. And it's enabled only for histograms. But it's more appropriate to panic in this case. Basically, it looks useless to me.

It seems I added the original check when I was refactoring the histogram from a map to a slice, just to make sure the new approach wouldn't go out of bounds (being cautious).

I suggest either removing this check now, or moving it inside hist.Update(mids). Not exactly that check, but rather a bounds check — ensuring we don't go out of slice bounds.
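The second option could look roughly like this. It is a sketch under stated assumptions, not the PR's histogram: `Hist`, `NewHist` and the error return are hypothetical, and the bucket formula follows the `uint64(mid)/uint64(histInterval) - histBase` expression quoted later in the review.

```go
package main

import "fmt"

// MID is a simplified stand-in for seq.MID (assumption).
type MID uint64

// Hist is a hypothetical slice-backed histogram.
type Hist struct {
	interval uint64
	base     uint64 // quotient of the first bucket: from / interval
	buckets  []uint64
}

// NewHist allocates buckets covering [from, to]; it assumes to >= from.
func NewHist(from, to MID, interval uint64) *Hist {
	base := uint64(from) / interval
	n := uint64(to)/interval - base + 1
	return &Hist{interval: interval, base: base, buckets: make([]uint64, n)}
}

// Update counts a batch of MIDs and rejects any MID outside the bucket range.
func (h *Hist) Update(mids []MID) error {
	for _, mid := range mids {
		q := uint64(mid) / h.interval
		if q < h.base || q-h.base >= uint64(len(h.buckets)) {
			return fmt.Errorf("mid %d is outside histogram range", mid)
		}
		h.buckets[q-h.base]++
	}
	return nil
}

func main() {
	h := NewHist(100, 199, 50) // two buckets: [100,150) and [150,200)
	fmt.Println(h.Update([]MID{100, 149, 150}), h.buckets)
	fmt.Println(h.Update([]MID{250}) != nil)
}
```

Returning an error is only one choice; given the argument above, a panic at the same spot would be equally consistent with the thread.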

dkharms · 2026-05-12T15:42:47Z

+	buffers := searchBuffersPool.Get().(searchBuffers)
+	defer searchBuffersPool.Put(buffers)
+	mids := buffers.mids
+	rids := buffers.rids


Starting a petition to protect Vim users and their descendants — we require spaces. This is how we navigate code. Thank you for your cooperation.

Maybe something like?

var (
	total  int
	lastID seq.ID
	ids    seq.IDSources
)

buffers := searchBuffersPool.Get().(searchBuffers)
defer searchBuffersPool.Put(buffers)

dkharms · 2026-05-12T15:50:48Z

 		}
 		// limit how much we drain from eval tree for one-by-one flow. ignored for batched flow
-		need = min(need, maxLidsToDrain)
+		needLids = min(needLids, maxLidsToDrain)


Maybe we can move this whole thing with calculating limits/offsets/etc to the batch? I mean something like:

if ok {
	evalTreeIter = func(need int, _ lidsBuf) LIDsIter {
		// batched flow: just get a batch and return
		return batchNode.NextBatch().Trim(need)
		// Or return batchNode.NextBatch(need)
	}
} else {
	...
}

func (b LIDBatch) Trim(k int) LIDBatch {
	b.lids = b.lids[:min(k, len(b.lids))]
	return b
}

I would prefer batchNode.NextBatch(need) but it needs more work to do.
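A self-contained version of the `Trim` idea, for illustration only. `LIDBatch` here is a stand-in with a single field, not the real type, and the sketch assumes `k` is never negative.

```go
package main

import "fmt"

// LID is a simplified stand-in for node.LID (assumption).
type LID uint32

// LIDBatch is a hypothetical batch type holding a slice of LIDs.
type LIDBatch struct{ lids []LID }

// Trim returns a copy of the batch limited to at most k LIDs.
// The value receiver leaves the original batch untouched.
func (b LIDBatch) Trim(k int) LIDBatch {
	b.lids = b.lids[:min(k, len(b.lids))]
	return b
}

func main() {
	b := LIDBatch{lids: []LID{1, 2, 3, 4}}
	fmt.Println(len(b.Trim(2).lids), len(b.Trim(10).lids), len(b.lids))
}
```

One thing to keep in mind with this shape: LIDs cut off by `Trim` are not returned to the batch node, which seems acceptable only when the caller stops iterating once the limit is reached.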

eguguchkin · 2026-05-26T09:20:47Z


 		timerEval.Start()
-		lidBatch := evalTree(need, buf)
+		lidBatch := evalTree(needLids, lidsBuffer)


The current implementation of this function is overly confusing.

Issues:

The needLids parameter is unused in one branch, while lidsBuffer is unused in another branch.

The line lidsSlice := lidBatch.LIDs(lidsBuffer.lids) reuses the buffer, but one of the LIDsIter implementations (lidsBuf) ignores this parameter.

The logic is hard not only to read but even to explain verbally.

Proposed simplification:

Remove the LIDsIter interface.

Remove lidsBuf.

Extract the wrapper into a separate function, e.g., batcher.

The batcher function should directly return []node.LID.

Implement a reusable buffer inside that function using a closure.

This will make the code clearer and eliminate the confusion caused by unused arguments in different branches.

diff --git a/frac/processor/search.go b/frac/processor/search.go
index c4836c3e..d7fe6e0a 100644
--- a/frac/processor/search.go
+++ b/frac/processor/search.go
@@ -44,26 +44,18 @@ type searchIndex interface {
 	GetSkipLIDs(minLID, maxLID uint32, reverse bool) (node.Node, bool, error)
 }
 
-type LIDsIter interface {
-	LIDs(out []node.LID) []node.LID
-	Len() int
-}
-
 type searchBuffers struct {
-	lids lidsBuf
+	lids []node.LID
 	mids []seq.MID
 	rids []seq.RID
 }
 
 var searchBuffersPool = sync.Pool{
 	New: func() any {
-		lidsBuf := lidsBuf{
-			lids: make([]node.LID, 0, consts.LIDBlockCap),
-		}
 		return &searchBuffers{
 			// Currently, we drain up to 4k lids from eval tree, but with proper batching enabled
 			// we can get as much as whole LID block can have (currently, 64k lids)
-			lids: lidsBuf,
+			lids: make([]node.LID, 0, consts.LIDBlockCap),
 			mids: make([]seq.MID, 0, consts.LIDBlockCap),
 			rids: make([]seq.RID, 0, consts.LIDBlockCap),
 		}
@@ -142,30 +134,8 @@ func IndexSearch(
 		m.Stop()
 	}
 
-	var evalTreeIter func(need int, out lidsBuf) LIDsIter
-	batchNode, ok := tryConvertToBatchedTree(evalTree)
-
-	if ok {
-		evalTreeIter = func(need int, _ lidsBuf) LIDsIter {
-			// batched flow: juts get a batch and return
-			return batchNode.NextBatch()
-		}
-	} else {
-		evalTreeIter = func(need int, buf lidsBuf) LIDsIter {
-			// iterator flow: buffer LIDs one by one and return a batch
-			for i := 0; i < need; i++ {
-				lid := evalTree.Next()
-				if lid.IsNull() {
-					break
-				}
-				buf = buf.append(lid)
-			}
-			return buf
-		}
-	}
-
 	m = sw.Start("iterate_eval_tree")
-	total, ids, histMap, aggs, err := iterateEvalTree(ctx, params, index, evalTreeIter, aggSupplier, sw)
+	total, ids, histMap, aggs, err := iterateEvalTree(ctx, params, index, evalTree, aggSupplier, sw)
 	m.Stop()
 
 	if err != nil {
@@ -207,11 +177,36 @@ func IndexSearch(
 	return qpr, nil
 }
 
+func batcher(evalTree node.Node, buf []node.LID) func(need int) []node.LID {
+	if batchNode, ok := tryConvertToBatchedTree(evalTree); ok {
+		return func(need int) []node.LID {
+			buf = batchNode.NextBatch().LIDs(buf[:0])
+			if len(buf) > need {
+				buf = buf[:need]
+			}
+			return buf
+		}
+	}
+
+	return func(need int) []node.LID {
+		// iterator flow: buffer LIDs one by one and return a batch
+		buf = buf[:0]
+		for range min(maxLidsToDrain, need) {
+			lid := evalTree.Next()
+			if lid.IsNull() {
+				break
+			}
+			buf = append(buf, lid)
+		}
+		return buf
+	}
+}
+
 func iterateEvalTree(
 	ctx context.Context,
 	params SearchParams,
 	idsIndex idsIndex,
-	evalTree func(need int, buf lidsBuf) LIDsIter,
+	evalTree node.Node,
 	aggSupplier func() ([]Aggregator, error),
 	sw *stopwatch.Stopwatch,
 ) (int, seq.IDSources, HistMap, []Aggregator, error) {
@@ -233,7 +228,8 @@ func iterateEvalTree(
 	defer searchBuffersPool.Put(buffers)
 	mids := buffers.mids
 	rids := buffers.rids
-	lidsBuffer := buffers.lids
+
+	batchedEvalTree := batcher(evalTree, buffers.lids)
 
 	timerEval := sw.Timer("eval_tree_next")
 	timerMID := sw.Timer("get_mid")
@@ -256,19 +252,15 @@ func iterateEvalTree(
 		if needScanAllRange {
 			needLids = math.MaxUint32
 		}
-		// limit how much we drain from eval tree for one-by-one flow. ignored for batched flow
-		needLids = min(needLids, maxLidsToDrain)
 
 		timerEval.Start()
-		lidBatch := evalTree(needLids, lidsBuffer)
+		lidsSlice := batchedEvalTree(needLids)
 		timerEval.Stop()
 
-		if lidBatch.Len() == 0 {
+		if len(lidsSlice) == 0 {
 			break
 		}
 
-		lidsSlice := lidBatch.LIDs(lidsBuffer.lids)
-
 		needMids := min(params.Limit-len(ids), len(lidsSlice))
 		if hasHist {
 			// need to fetch mids for all lids for hist
@@ -377,26 +369,6 @@ func filterOutOfRangeMIDs(params SearchParams, mids []seq.MID, lidsSlice []node.
 	return mids, lidsSlice
 }
 
-// lidsBuf maintains node.LID in slice as is (append order).
-// Used to drain batches of LIDs when eval tree doesn't support batching.
-type lidsBuf struct {
-	lids []node.LID
-}
-
-func (b lidsBuf) append(x node.LID) lidsBuf {
-	return lidsBuf{
-		lids: append(b.lids, x),
-	}
-}
-
-func (b lidsBuf) Len() int {
-	return len(b.lids)
-}
-
-func (b lidsBuf) LIDs(_ []node.LID) []node.LID {
-	return b.lids
-}
-
 func tryConvertToBatchedTree(evalTree node.Node) (node.BatchedNode, bool) {
 	switch it := evalTree.(type) {
 	case *lids.IteratorDesc:

eguguchkin · 2026-05-26T10:03:04Z

-					bucketIndex := uint64(mid)/uint64(histInterval) - histBase
-					histogram[bucketIndex]++
-				}
+		needMids := min(params.Limit-len(ids), len(lidsSlice))


nit: naming:

needMids -> needMIDs or midsNeeded

needLids -> needLIDs or lidsNeeded

needIds -> needIDs or idsNeeded

eguguchkin · 2026-05-26T12:47:58Z

 	return seq.MID(p.midCache.GetValByLID(uint32(lid))), nil
 }

+func (p *Provider) MIDs(lids []node.LID, out []seq.MID) ([]seq.MID, error) {


nit: The only place where MIDs is called is sealedIDsIndex.GetMIDs — and it's just a wrapper:

func (ii *sealedIDsIndex) GetMIDs(lidsBatch []node.LID, out []seq.MID) []seq.MID {
	mids, err := ii.provider.MIDs(lidsBatch, out)
	if err != nil {
		logger.Panic("get mids error",
			zap.String("frac", ii.fracName),
			zap.Int("lids_count", len(lidsBatch)),
			zap.Error(err))
	}
	return mids
}

Why not just implement this logic directly inside sealedIDsIndex.GetMIDs?

As it stands, it's unclear why we created a new package dependency — seqids on node (because of node.LID).

batch processing for iterateEvalTree

74748b8

cheb0 changed the base branch from main to 329-batching-1 April 22, 2026 11:55

cheb0 added 2 commits April 23, 2026 15:44

linter issues

afb477f

sync.pool for all buffers

b62a56b

cheb0 marked this pull request as ready for review April 23, 2026 14:19

eguguchkin requested review from dkharms and forshev April 27, 2026 11:03

forshev approved these changes May 4, 2026

cheb0 added the performance Features or improvements that positively affect seq-db performance label May 12, 2026

dkharms reviewed May 12, 2026

Merge branch '329-batching-1' into 329-batching-iterate-eval-tree

a92bb18

eguguchkin modified the milestones: v0.74.0, v0.72.0 May 18, 2026

code review fixes

a5e6ac8

eguguchkin reviewed May 26, 2026

6 participants